• R generation

https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01169.x

R: easy for hard things, hard for easy things

R Syntax Comparison::CHEAT SHEET

https://www.amelia.mn/Syntax-cheatsheet.pdf


R packages

Why do packages exist?



https://blog.mitchelloharawild.com/blog/user-2018-feature-wall/



Build your own package universe


Installing R packages

install.packages("tidyverse", dependencies = TRUE)
install.packages("jmv", dependencies = TRUE)
install.packages("questionr", dependencies = TRUE)
install.packages("Rcmdr", dependencies = TRUE)
install.packages("summarytools")
# library(tidyverse)
# library(jmv)
# library(questionr)
# library(summarytools)
# library(gganimate)
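Calling `install.packages()` at the top of every script reinstalls packages that are already present. A minimal sketch, assuming you only want to install what is missing: `missing_packages()` is a hypothetical helper (not part of any package) built on base R's `requireNamespace()`, which checks whether a package can be loaded without attaching it.

```r
# Hypothetical helper: which of `pkgs` cannot be loaded on this machine?
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package is installed and loadable,
  # without attaching it to the search path
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

wanted <- c("tidyverse", "jmv", "questionr", "Rcmdr", "summarytools")
to_install <- missing_packages(wanted)
print(to_install)

# Install only what is missing (uncomment to run)
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```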

Finding help for R

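# ?mean            # help page for a function
# ??efetch         # search every installed help page for a term
# help(merge)      # same as ?merge
# example(merge)   # run the examples from the help page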
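Besides `?` and `??`, base R can search by name. A short sketch: `apropos()` lists objects on the search path whose names match a regular expression (case-insensitive by default), and `help.search()` is the function that `??` calls.

```r
# All objects on the search path whose names contain "mean"
mean_funs <- apropos("mean")
print(mean_funs)

# Anchored regular expression: only the exact name
exact <- apropos("^mean$")
print(exact)

# help.search() is what `??` calls (uncomment to open the results)
# help.search("merge")

# List the vignettes of an installed package, e.g. grid (ships with R)
# vignette(package = "grid")
```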
  • Vignette

https://stackoverflow.com/

  • Google with suitable keywords

  • Adding [R] to a Google search can also help.

  • searcher package 📦


http://cran.r-project.org/doc/contrib/Baggott-refcard-v2.pdf

https://www.rstudio.com/resources/cheatsheets/

  • Awesome R

https://github.com/qinwf/awesome-R#readme

https://awesome-r.com/

  • Twitter

https://twitter.com/hashtag/rstats?src=hash

  • Reproducible Examples
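For a reproducible example, the people helping you need a small copy of your data that they can paste into R. One common approach, sketched here with the first rows of the built-in `iris` data, is base R's `dput()`, which prints R code that rebuilds an object; `deparse()` returns that kind of code as a character vector, which lets us check the round trip.

```r
# A few rows are usually enough to show a problem
small <- head(iris, 3)

# Prints R code built with structure(list(...)); paste it into your question
dput(small)

# deparse() returns the same kind of code as text; parsing and evaluating it
# rebuilds the data frame, which is what a helper does after pasting it
code <- deparse(small)
restored <- eval(parse(text = code))
```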

Viewing the data

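# library(nycflights13)
# summary(flights)
# `data` below stands for your own data frame
View(data)            # spreadsheet-style viewer (RStudio)
data                  # print the object to the console
head(data)            # first 6 rows
tail(data)            # last 6 rows
dplyr::glimpse(data)  # one line per column: type and first values
str(data)             # structure: class and type of every column
skimr::skim(data)     # compact summary with missing counts and histograms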
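A quick look at a data frame with base R only, using the built-in `iris` data in place of your own `data` (a sketch; `glimpse()` and `skim()` need their packages installed).

```r
# Built-in iris data stand in for your own data frame
dim(iris)                      # number of rows and columns
first_rows <- head(iris)       # first 6 rows by default
last_rows  <- tail(iris, 3)    # last 3 rows; original row names are kept
print(first_rows)
print(last_rows)
str(iris)                      # class of each column plus its first values
summary(iris)                  # per-column summaries; counts for factors
```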

Modifying the data

Let's modify the data with code
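A minimal base R sketch of changing data with code. It works on a copy of the built-in `iris` data; the new column names `species_short` and `wide_petal`, and the short labels, are made up for illustration.

```r
iris2 <- iris  # work on a copy; the built-in data stay untouched

# Relabel a factor: same order of levels, shorter (made-up) labels
iris2$species_short <- factor(iris2$Species,
                              levels = c("setosa", "versicolor", "virginica"),
                              labels = c("set", "ver", "vir"))

# Derive a new logical column from an existing numeric one
iris2$wide_petal <- iris2$Petal.Width > 1

table(iris2$species_short)
table(iris2$wide_petal)
```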

Modifying the data with add-ins


Recoding through RStudio

We will use the questionr package.


https://juba.github.io/questionr/articles/recoding_addins.html
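The questionr add-ins write the recoding code for you, so the result stays in your script. The block below is a hand-written sketch of the same kind of step (cutting a numeric variable into classes with base R's `cut()`); the break points, labels, and the `sepal_group` column name are arbitrary choices for illustration, not the add-in's output.

```r
iris2 <- iris

# Three classes with right-closed intervals: (4, 5.5], (5.5, 7], (7, 8]
iris2$sepal_group <- cut(iris2$Sepal.Length,
                         breaks = c(4, 5.5, 7, 8),
                         labels = c("short", "medium", "long"))

# useNA = "ifany" would show an NA count if any value fell outside the breaks
table(iris2$sepal_group, useNA = "ifany")
```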




Basic descriptive statistics

summary()
mean
median
min
max
sd
table()
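The functions listed above applied to one variable of the built-in `iris` data (a sketch; the names in `desc_stats` are our own). The rounded values can be compared with the `descr()` output for `Sepal.Length` further down.

```r
x <- iris$Sepal.Length

desc_stats <- c(mean   = mean(x),
                sd     = sd(x),
                min    = min(x),
                median = median(x),
                max    = max(x))
round(desc_stats, 2)

summary(x)            # min, quartiles, median, mean, max in one call
table(iris$Species)   # counts for a categorical variable

# With missing values these functions return NA unless told otherwise
mean(c(1, 2, NA), na.rm = TRUE)
```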
library(readr)
# irisdata <- read_csv("data/iris.csv")

# jmv::descriptives(
#     data = irisdata,
#     vars = "Sepal.Length",
#     splitBy = "Species",
#     freq = TRUE,
#     hist = TRUE,
#     dens = TRUE,
#     bar = TRUE,
#     box = TRUE,
#     violin = TRUE,
#     dot = TRUE,
#     mode = TRUE,
#     sum = TRUE,
#     sd = TRUE,
#     variance = TRUE,
#     range = TRUE,
#     se = TRUE,
#     skew = TRUE,
#     kurt = TRUE,
#     quart = TRUE,
#     pcEqGr = TRUE)

# install.packages("scatr")

# scatr::scat(
#     data = irisdata,
#     x = "Sepal.Length",
#     y = "Sepal.Width",
#     group = "Species",
#     marg = "dens",
#     line = "linear",
#     se = TRUE)

summarytools

https://cran.r-project.org/web/packages/summarytools/vignettes/Introduction.html

library(summarytools)
summarytools::freq(iris$Species, style = "rmarkdown")

Frequencies

iris$Species

Type: Factor

              Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
setosa          50     33.33          33.33     33.33          33.33
versicolor      50     33.33          66.67     33.33          66.67
virginica       50     33.33         100.00     33.33         100.00
<NA>             0                               0.00         100.00
Total          150    100.00         100.00    100.00         100.00
summarytools::freq(iris$Species, report.nas = FALSE, style = "rmarkdown", headings = FALSE)
              Freq        %   % Cum.
setosa          50    33.33    33.33
versicolor      50    33.33    66.67
virginica       50    33.33   100.00
Total          150   100.00   100.00
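The `freq()` columns can be rebuilt in base R, which makes it clear what each one means: `table()` gives counts, `prop.table()` turns them into proportions, and `cumsum()` accumulates them. A sketch for `iris$Species` (the column names `Pct` and `Cum.Pct` are our own):

```r
counts <- table(iris$Species)
pct    <- 100 * prop.table(counts)   # share of each level, in percent

freq_table <- data.frame(
  Freq      = as.vector(counts),
  Pct       = round(as.vector(pct), 2),
  Cum.Pct   = round(cumsum(as.vector(pct)), 2),
  row.names = names(counts)
)
freq_table
```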
with(tobacco, print(ctable(smoker, diseased), method = 'render'))

Cross-Tabulation, Row Proportions

smoker * diseased

Data Frame: tobacco
                 diseased
smoker           Yes            No           Total
Yes      125 (41.9%)   173 (58.1%)    298 (100.0%)
No        99 (14.1%)   603 (85.9%)    702 (100.0%)
Total    224 (22.4%)   776 (77.6%)   1000 (100.0%)

Generated by summarytools 0.9.3 (R version 3.6.0)
2019-06-30

with(tobacco, 
     print(ctable(smoker, diseased, prop = 'n', totals = FALSE), 
           headings = FALSE, method = "render"))
      diseased
smoker   Yes    No
Yes      125   173
No        99   603

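Row proportions in `ctable()` divide each cell by its row total. A base R sketch using the smoker by diseased counts from the cross-table above, typed in by hand so it runs without the `tobacco` data:

```r
# Counts copied from the smoker * diseased table
counts <- matrix(c(125, 173,
                    99, 603),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(smoker = c("Yes", "No"),
                                 diseased = c("Yes", "No")))

# margin = 1: divide each cell by its row total
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct

# Row and column totals, as in ctable()'s Total row and column
with_totals <- addmargins(as.table(counts))
with_totals
```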

summarytools::descr(iris, style = "rmarkdown")
Non-numerical variable(s) ignored: Species

Descriptive Statistics

iris

N: 150

              Petal.Length   Petal.Width  Sepal.Length   Sepal.Width
Mean                  3.76          1.20          5.84          3.06
Std.Dev               1.77          0.76          0.83          0.44
Min                   1.00          0.10          4.30          2.00
Q1                    1.60          0.30          5.10          2.80
Median                4.35          1.30          5.80          3.00
Q3                    5.10          1.80          6.40          3.30
Max                   6.90          2.50          7.90          4.40
MAD                   1.85          1.04          1.04          0.44
IQR                   3.50          1.50          1.30          0.50
CV                    0.47          0.64          0.14          0.14
Skewness             -0.27         -0.10          0.31          0.31
SE.Skewness           0.20          0.20          0.20          0.20
Kurtosis             -1.42         -1.36         -0.61          0.14
N.Valid             150.00        150.00        150.00        150.00
Pct.Valid           100.00        100.00        100.00        100.00
descr(iris, stats = c("mean", "sd", "min", "med", "max"), transpose = TRUE, 
      headings = FALSE, style = "rmarkdown")
Non-numerical variable(s) ignored: Species
              Mean   Std.Dev    Min   Median    Max
Petal.Length  3.76      1.77   1.00     4.35   6.90
Petal.Width   1.20      0.76   0.10     1.30   2.50
Sepal.Length  5.84      0.83   4.30     5.80   7.90
Sepal.Width   3.06      0.44   2.00     3.00   4.40
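The transposed `descr()` table above can be approximated in base R by applying the same five statistics to every numeric column. A sketch; note that base R keeps the original column order of `iris`, whereas the summarytools output above lists the variables in a different order.

```r
# Keep numeric columns only; descr() likewise skips the Species factor
num <- iris[sapply(iris, is.numeric)]

# One row per variable, one column per statistic (like transpose = TRUE)
descr_base <- t(sapply(num, function(x) c(Mean    = mean(x),
                                          Std.Dev = sd(x),
                                          Min     = min(x),
                                          Median  = median(x),
                                          Max     = max(x))))
round(descr_base, 2)
```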
# view(dfSummary(iris))
dfSummary(tobacco, plain.ascii = FALSE, style = "grid")
text graphs are displayed; set 'tmp.img.dir' parameter to activate png graphs

Data Frame Summary

tobacco

Dimensions: 1000 x 9
Duplicates: 2

No  Variable       Stats / Values              Freqs (% of Valid)    Valid          Missing
1   gender         1. F                        489 (50.0%)           978 (97.8%)    22 (2.2%)
    [factor]       2. M                        489 (50.0%)
2   age            Mean (sd) : 49.6 (18.3)     63 distinct values    975 (97.5%)    25 (2.5%)
    [numeric]      min < med < max:
                   18 < 50 < 80
                   IQR (CV) : 32 (0.4)
3   age.gr         1. 18-34                    258 (26.5%)           975 (97.5%)    25 (2.5%)
    [factor]       2. 35-50                    241 (24.7%)
                   3. 51-70                    317 (32.5%)
                   4. 71 +                     159 (16.3%)
4   BMI            Mean (sd) : 25.7 (4.5)      974 distinct values   974 (97.4%)    26 (2.6%)
    [numeric]      min < med < max:
                   8.8 < 25.6 < 39.4
                   IQR (CV) : 5.7 (0.2)
5   smoker         1. Yes                      298 (29.8%)           1000 (100%)    0 (0%)
    [factor]       2. No                       702 (70.2%)
6   cigs.per.day   Mean (sd) : 6.8 (11.9)      37 distinct values    965 (96.5%)    35 (3.5%)
    [numeric]      min < med < max:
                   0 < 0 < 40
                   IQR (CV) : 11 (1.8)
7   diseased       1. Yes                      224 (22.4%)           1000 (100%)    0 (0%)
    [factor]       2. No                       776 (77.6%)
8   disease        1. Hypertension             36 (16.2%)            222 (22.2%)    778 (77.8%)
    [character]    2. Cancer                   34 (15.3%)
                   3. Cholesterol              21 ( 9.5%)
                   4. Heart                    20 ( 9.0%)
                   5. Pulmonary                20 ( 9.0%)
                   6. Musculoskeletal          19 ( 8.6%)
                   7. Diabetes                 14 ( 6.3%)
                   8. Hearing                  14 ( 6.3%)
                   9. Digestive                12 ( 5.4%)
                   10. Hypotension             11 ( 5.0%)
                   [ 3 others ]                21 ( 9.5%)
9   samp.wgts      Mean (sd) : 1 (0.1)         0.86!: 267 (26.7%)    1000 (100%)    0 (0%)
    [numeric]      min < med < max:            1.04!: 249 (24.9%)
                   0.9 < 1 < 1.1               1.05!: 324 (32.4%)
                   IQR (CV) : 0.2 (0.1)        1.06!: 160 (16.0%)
                                               ! rounded
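The Valid and Missing columns of `dfSummary()`, and its Duplicates count, come down to `is.na()` and `duplicated()`. A base R sketch on the built-in `airquality` data, chosen because (unlike `iris`) it contains missing values; `na_summary` and its column names are our own.

```r
aq <- airquality  # built-in data set with missing values in two columns

na_summary <- data.frame(
  Valid       = colSums(!is.na(aq)),
  Missing     = colSums(is.na(aq)),
  Pct.Missing = round(100 * colMeans(is.na(aq)), 1)
)
na_summary

# Rows that exactly repeat an earlier row ("Duplicates" in dfSummary)
sum(duplicated(aq))
```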
# First save the results

iris_stats_by_species <- by(data = iris, 
                            INDICES = iris$Species, 
                            FUN = descr, stats = c("mean", "sd", "min", "med", "max"), 
                            transpose = TRUE)

# Then use view(), like so:

view(iris_stats_by_species, method = "pander", style = "rmarkdown")
Non-numerical variable(s) ignored: Species

Descriptive Statistics

iris

Group: Species = setosa
N: 50

              Mean   Std.Dev    Min   Median    Max
Petal.Length  1.46      0.17   1.00     1.50   1.90
Petal.Width   0.25      0.11   0.10     0.20   0.60
Sepal.Length  5.01      0.35   4.30     5.00   5.80
Sepal.Width   3.43      0.38   2.30     3.40   4.40

Group: Species = versicolor
N: 50

              Mean   Std.Dev    Min   Median    Max
Petal.Length  4.26      0.47   3.00     4.35   5.10
Petal.Width   1.33      0.20   1.00     1.30   1.80
Sepal.Length  5.94      0.52   4.90     5.90   7.00
Sepal.Width   2.77      0.31   2.00     2.80   3.40

Group: Species = virginica
N: 50

              Mean   Std.Dev    Min   Median    Max
Petal.Length  5.55      0.55   4.50     5.55   6.90
Petal.Width   2.03      0.27   1.40     2.00   2.50
Sepal.Length  6.59      0.64   4.90     6.50   7.90
Sepal.Width   2.97      0.32   2.20     3.00   3.80
# view(iris_stats_by_species)
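Grouped statistics without summarytools: a base R sketch with `aggregate()` (one statistic for every numeric column per group) and `tapply()` (one statistic of one variable per group). Rounded to two decimals, the values can be checked against the by-species tables above.

```r
# Mean of every numeric column within each species (formula interface)
group_means <- aggregate(. ~ Species, data = iris, FUN = mean)
group_means

# Median of a single variable per species
med <- tapply(iris$Sepal.Length, iris$Species, median)
med
```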

data(tobacco) # tobacco is an example dataframe included in the package
BMI_by_age <- with(tobacco, 
                   by(BMI, age.gr, descr, 
                      stats = c("mean", "sd", "min", "med", "max")))
view(BMI_by_age, "pander", style = "rmarkdown")

Descriptive Statistics

BMI by age.gr

Data Frame: tobacco
N: 258

          18-34   35-50   51-70    71 +
Mean      23.84   25.11   26.91   27.45
Std.Dev    4.23    4.34    4.26    4.37
Min        8.83   10.35    9.01   16.36
Median    24.04   25.11   26.77   27.52
Max       34.84   39.44   39.21   38.37

BMI_by_age <- with(tobacco, 
                   by(BMI, age.gr, descr,  transpose = TRUE,
                      stats = c("mean", "sd", "min", "med", "max")))

view(BMI_by_age, "pander", style = "rmarkdown", headings = FALSE)
         Mean   Std.Dev     Min   Median     Max
18-34   23.84      4.23    8.83    24.04   34.84
35-50   25.11      4.34   10.35    25.11   39.44
51-70   26.91      4.26    9.01    26.77   39.21
71 +    27.45      4.37   16.36    27.52   38.37

tobacco_subset <- tobacco[ ,c("gender", "age.gr", "smoker")]
freq_tables <- lapply(tobacco_subset, freq)

# view(freq_tables, footnote = NA, file = 'freq-tables.html')

what.is(iris)

$properties
      property       value
1        class  data.frame
2       typeof        list
3         mode        list
4 storage.mode        list
5          dim     150 x 5
6       length           5
7    is.object        TRUE
8  object.type          S3
9  object.size  7256 Bytes

$attributes.lengths
    names     class row.names
        5         1       150

$extensive.is
[1] "is.data.frame" "is.list"       "is.object"     "is.recursive"
[5] "is.unsorted"
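Most of what `what.is()` reports is available from individual base R functions. A sketch on `iris`; `object.size()` depends on the R version and platform, so the byte count may differ from the 7256 bytes shown above.

```r
class(iris)                             # "data.frame"
typeof(iris)                            # "list": columns are stored as a list
mode(iris)                              # "list"
dim(iris)                               # 150 5
length(iris)                            # 5, i.e. the number of columns
is.object(iris)                         # TRUE (it has a class attribute)
names(attributes(iris))                 # names, class and row.names
format(object.size(iris), units = "B")  # size in bytes; varies by R version
```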


freq(tobacco$gender, style = 'rmarkdown')
## ### Frequencies  
## #### tobacco$gender  
## **Type:** Factor  
## 
## |     &nbsp; | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
## |-----------:|-----:|--------:|-------------:|--------:|-------------:|
## |      **F** |  489 |   50.00 |        50.00 |   48.90 |        48.90 |
## |      **M** |  489 |   50.00 |       100.00 |   48.90 |        97.80 |
## | **\<NA\>** |   22 |         |              |    2.20 |       100.00 |
## |  **Total** | 1000 |  100.00 |       100.00 |  100.00 |       100.00 |

print(freq(tobacco$gender), method = 'render')

Frequencies

tobacco$gender

Type: Factor
                  Valid            Total
gender  Freq       %   % Cum.       %   % Cum.
F        489   50.00    50.00   48.90    48.90
M        489   50.00   100.00   48.90    97.80
<NA>      22                     2.20   100.00
Total   1000  100.00   100.00  100.00   100.00



skimr

library(skimr)
skim(iris)  # replace iris with your data frame; `df` alone is the F-density function

DataExplorer

library(DataExplorer)
DataExplorer::create_report(iris)  # replace iris with your data frame


Graphs

# library(ggplot2)
# library(mosaic)
# mPlot(irisdata)

ctable(tobacco$gender, tobacco$smoker, style = 'rmarkdown')

Cross-Tabulation, Row Proportions

gender * smoker

Data Frame: tobacco

                 smoker
gender           Yes            No           Total
F        147 (30.1%)   342 (69.9%)    489 (100.0%)
M        143 (29.2%)   346 (70.8%)    489 (100.0%)
<NA>       8 (36.4%)    14 (63.6%)     22 (100.0%)
Total    298 (29.8%)   702 (70.2%)   1000 (100.0%)

print(ctable(tobacco$gender, tobacco$smoker), method = 'render')

Cross-Tabulation, Row Proportions

gender * smoker

Data Frame: tobacco
                 smoker
gender           Yes            No           Total
F        147 (30.1%)   342 (69.9%)    489 (100.0%)
M        143 (29.2%)   346 (70.8%)    489 (100.0%)
<NA>       8 (36.4%)    14 (63.6%)     22 (100.0%)
Total    298 (29.8%)   702 (70.2%)   1000 (100.0%)


descr(tobacco, style = 'rmarkdown')

print(descr(tobacco), method = 'render', table.classes = 'st-small')

dfSummary(tobacco, style = 'grid', plain.ascii = FALSE)

print(dfSummary(tobacco, graph.magnif = 0.75), method = 'render')



Tables


Some graphical interfaces


Rcmdr

library(Rcmdr)

Rcmdr::Commander()
  • A Comparative Review of the R Commander GUI for R

http://r4stats.com/articles/software-reviews/r-commander/


Where to learn R


Next topics

  • GitHub with RStudio
  • Hypothesis tests
  • Reproducible reports with R Markdown and R Notebook

Feedback




Save Final Data

Save the data after analysis as `Data-After-Analysis.rds` and `Data-After-Analysis.xlsx`.

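# R's native format: keeps column types and factor levels
saveRDS(mydata, "Data-After-Analysis.rds")

# Excel copy for sharing (requires the writexl package)
writexl::write_xlsx(mydata, "Data-After-Analysis.xlsx")

# Timestamp of the Excel file (ctime: last status change on Unix, creation time on Windows)
file.info("Data-After-Analysis.xlsx")$ctime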
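An `.rds` file stores one R object together with its column types and factor levels, which a spreadsheet export may not preserve. A minimal round-trip sketch with base R, using `iris` in place of `mydata` and a temporary file instead of `Data-After-Analysis.rds`:

```r
# A temporary file keeps the example from littering the working directory
path <- tempfile(fileext = ".rds")
saveRDS(iris, path)

# readRDS() returns the object; assign it to any name you like
restored <- readRDS(path)
file.exists(path)

unlink(path)  # remove the temporary file
```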

Libraries Used

citation()
## 
## To cite R in publications use:
## 
##   R Core Team (2019). R: A language and environment for
##   statistical computing. R Foundation for Statistical Computing,
##   Vienna, Austria. URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2019},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please
## cite it when using it for data analysis. See also
## 'citation("pkgname")' for citing R packages.
citation("tidyverse")
citation("foreign")
citation("tidylog")
citation("janitor")
citation("jmv")
citation("tangram")
citation("finalfit")
citation("summarytools")
citation("ggstatsplot")
citation("readxl")

report::cite_packages(session = sessionInfo())

References
1 Dominic Comtois (2019). summarytools: Tools to Quickly and Neatly Summarize Data. R package version 0.9.3. https://CRAN.R-project.org/package=summarytools
2 Hadley Wickham, Jim Hester and Romain Francois (2018). readr: Read Rectangular Text Data. R package version 1.3.1. https://CRAN.R-project.org/package=readr


sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] summarytools_0.9.3 readr_1.3.1       
## 
## loaded via a namespace (and not attached):
##  [1] zoo_1.8-6           tidyselect_0.2.5    report_0.1.0       
##  [4] xfun_0.8            performance_0.2.0   purrr_0.3.2        
##  [7] pander_0.6.3        splines_3.6.0       lattice_0.20-38    
## [10] parameters_0.1.0    tcltk_3.6.0         vctrs_0.1.99.9000  
## [13] htmltools_0.3.6     yaml_2.2.0          survival_2.44-1.1  
## [16] rlang_0.4.0.9000    pillar_1.4.1        glue_1.3.1         
## [19] estimate_0.1.0      pryr_0.1.4          matrixStats_0.54.0 
## [22] emmeans_1.3.5.1     multcomp_1.4-10     plyr_1.8.4         
## [25] stringr_1.4.0       bayestestR_0.2.2    mvtnorm_1.0-11     
## [28] codetools_0.2-16    coda_0.19-2         evaluate_0.14      
## [31] knitr_1.23          TH.data_1.0-10      Rcpp_1.0.1         
## [34] xtable_1.8-4        backports_1.1.4     checkmate_1.9.3    
## [37] magick_2.0          rapportools_1.0     hms_0.4.2          
## [40] digest_0.6.19       stringi_1.4.3       insight_0.3.0      
## [43] dplyr_0.8.1         grid_3.6.0          tools_3.6.0        
## [46] bitops_1.0-6        sandwich_2.5-1      magrittr_1.5       
## [49] RCurl_1.95-4.12     tibble_2.1.3        crayon_1.3.4       
## [52] tidyr_0.8.3.9000    pkgconfig_2.0.2     zeallot_0.1.0      
## [55] MASS_7.3-51.4       ellipsis_0.2.0.9000 Matrix_1.2-17      
## [58] correlation_0.1.0   estimability_1.3    lubridate_1.7.4    
## [61] assertthat_0.2.1    rmarkdown_1.13      boot_1.3-22        
## [64] R6_2.4.0            compiler_3.6.0

Notes

Completed on 2019-06-30 19:37:12.

Serdar Balci, MD, Pathologist
https://rpubs.com/sbalci/CV

https://sbalci.github.io/
https://github.com/sbalci


  1. This is a compilation; I have tried to cite the sources wherever possible.